GE Aviation - Remaining Useful Life Analysis

Part 4 - Model Building

Author

Linh Tran

Read the Data

import pandas as pd
df = pd.read_csv(r"D:\School\FL 2022\ISA 401\GE\ge_data.csv")  # raw string so backslashes in the Windows path are not treated as escapes
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 36 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   dataset               100 non-null    object 
 1   esn                   100 non-null    int64  
 2   unit                  100 non-null    int64  
 3   operator              100 non-null    object 
 4   last_flight_cycle     100 non-null    int64  
 5   last_datetime         100 non-null    object 
 6   mean_tra              100 non-null    int64  
 7   mean_t2               100 non-null    float64
 8   mean_t24              100 non-null    float64
 9   mean_t30              100 non-null    float64
 10  mean_t50              100 non-null    float64
 11  mean_p2               100 non-null    float64
 12  mean_p15              100 non-null    float64
 13  mean_p30              100 non-null    float64
 14  mean_nf               100 non-null    float64
 15  mean_nc               100 non-null    float64
 16  mean_epr              100 non-null    float64
 17  mean_ps30             100 non-null    float64
 18  mean_phi              100 non-null    float64
 19  mean_nrf              100 non-null    float64
 20  mean_nrc              100 non-null    float64
 21  mean_bpr              100 non-null    float64
 22  mean_farb             100 non-null    float64
 23  mean_htbleed          100 non-null    float64
 24  mean_nf_dmd           100 non-null    int64  
 25  mean_pcnfr_dmd        100 non-null    int64  
 26  mean_w31              100 non-null    float64
 27  mean_w32              100 non-null    float64
 28  mean_X44321P02_op016  100 non-null    float64
 29  mean_X44321P02_op420  100 non-null    float64
 30  mean_X54321P01_op116  100 non-null    float64
 31  mean_X54321P01_op220  100 non-null    float64
 32  mean_X65421P11_op232  100 non-null    float64
 33  mean_X65421P11_op630  100 non-null    float64
 34  total_distance        100 non-null    float64
 35  rul                   100 non-null    int64  
dtypes: float64(26), int64(7), object(3)
memory usage: 28.2+ KB

Drop Unnecessary Variables

The variables flagged as unnecessary in the previous part (identifiers and uninformative sensor readings) are dropped:

vars_to_drop = ['dataset','esn', 'unit', 'last_datetime','mean_tra','mean_t2','mean_p2', 
                'mean_epr','mean_farb','mean_nf_dmd', 'mean_pcnfr_dmd', 'mean_p15', 'mean_t24']
df.drop(vars_to_drop, axis = 1, inplace = True)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 23 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   operator              100 non-null    object 
 1   last_flight_cycle     100 non-null    int64  
 2   mean_t30              100 non-null    float64
 3   mean_t50              100 non-null    float64
 4   mean_p30              100 non-null    float64
 5   mean_nf               100 non-null    float64
 6   mean_nc               100 non-null    float64
 7   mean_ps30             100 non-null    float64
 8   mean_phi              100 non-null    float64
 9   mean_nrf              100 non-null    float64
 10  mean_nrc              100 non-null    float64
 11  mean_bpr              100 non-null    float64
 12  mean_htbleed          100 non-null    float64
 13  mean_w31              100 non-null    float64
 14  mean_w32              100 non-null    float64
 15  mean_X44321P02_op016  100 non-null    float64
 16  mean_X44321P02_op420  100 non-null    float64
 17  mean_X54321P01_op116  100 non-null    float64
 18  mean_X54321P01_op220  100 non-null    float64
 19  mean_X65421P11_op232  100 non-null    float64
 20  mean_X65421P11_op630  100 non-null    float64
 21  total_distance        100 non-null    float64
 22  rul                   100 non-null    int64  
dtypes: float64(20), int64(2), object(1)
memory usage: 18.1+ KB
df.drop('rul', axis=1).columns  ## inspect the remaining candidate predictors
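As a hedged aside, constant sensors such as `mean_tra` (the same value on every row) could also be caught programmatically rather than listed by hand. A minimal sketch with a tiny illustrative frame (the values below are made up):

```python
import pandas as pd

# Tiny illustrative frame: one informative sensor and one constant sensor
# (mirroring columns like mean_tra that never vary in this dataset).
df_demo = pd.DataFrame({
    "mean_t30": [1580.1, 1585.7, 1590.3, 1602.8],
    "mean_tra": [100, 100, 100, 100],  # constant -> carries no signal
})

# Numeric columns whose standard deviation is zero are candidates to drop.
constant_cols = [c for c in df_demo.columns if df_demo[c].std() == 0]
df_demo = df_demo.drop(columns=constant_cols)
print(constant_cols)  # ['mean_tra']
```

This catches zero-variance columns only; near-constant columns would need a small tolerance instead of an exact zero check.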

Build Model

As outlined in the previous parts, the goal is a regression model for remaining useful life. I used PyCaret to automate the model-building process.

from pycaret.regression import *
s = setup(df, target='rul', train_size=0.9, session_id=123,
          remove_multicollinearity=True, multicollinearity_threshold=0.8,
          polynomial_features=True, feature_interaction=True, fold=5)
  Description Value
0 session_id 123
1 Target rul
2 Original Data (100, 23)
3 Missing Values False
4 Numeric Features 21
5 Categorical Features 1
6 Ordinal Features False
7 High Cardinality Features False
8 High Cardinality Method None
9 Transformed Train Set (90, 17)
10 Transformed Test Set (10, 17)
11 Shuffle Train-Test True
12 Stratify Train-Test False
13 Fold Generator KFold
14 Fold Number 5
15 CPU Jobs -1
16 Use GPU False
17 Log Experiment False
18 Experiment Name reg-default-name
19 USI 9f1e
20 Imputation Type simple
21 Iterative Imputation Iteration None
22 Numeric Imputer mean
23 Iterative Imputation Numeric Model None
24 Categorical Imputer constant
25 Iterative Imputation Categorical Model None
26 Unknown Categoricals Handling least_frequent
27 Normalize False
28 Normalize Method None
29 Transformation False
30 Transformation Method None
31 PCA False
32 PCA Method None
33 PCA Components None
34 Ignore Low Variance False
35 Combine Rare Levels False
36 Rare Level Threshold None
37 Numeric Binning False
38 Remove Outliers False
39 Outliers Threshold None
40 Remove Multicollinearity True
41 Multicollinearity Threshold 0.800000
42 Remove Perfect Collinearity True
43 Clustering False
44 Clustering Iteration None
45 Polynomial Features True
46 Polynomial Degree 2
47 Trignometry Features False
48 Polynomial Threshold 0.100000
49 Group Features False
50 Feature Selection False
51 Feature Selection Method classic
52 Features Selection Threshold None
53 Feature Interaction True
54 Feature Ratio False
55 Interaction Threshold 0.010000
56 Transform Target False
57 Transform Target Method box-cox

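With `remove_multicollinearity=True` and a threshold of 0.8, PyCaret drops one feature from each highly correlated pair. A minimal pandas sketch of the same idea, using synthetic data and illustrative column names (not the actual GE sensors):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(123)
x = rng.normal(size=200)
df_demo = pd.DataFrame({
    "mean_nf": x,
    "mean_nrf": x + rng.normal(scale=0.01, size=200),  # nearly duplicates mean_nf
    "mean_bpr": rng.normal(size=200),                  # independent signal
})

# Upper triangle of the absolute correlation matrix; drop one column
# from every pair whose |correlation| exceeds the 0.8 threshold.
corr = df_demo.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.8).any()]
print(to_drop)  # ['mean_nrf']
```

This mirrors the intent of the setting, though PyCaret's exact tie-breaking rule for which column of a pair survives may differ.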
Given that there are only 100 observations, more flexible models would be prone to overfitting, so only linear models (plain and regularized) were considered:

  • Linear Regression

  • Lasso Regression

  • Ridge Regression

  • Elastic Net

  • Least Angle Regression

  • Lasso Least Angle Regression

best = compare_models(include=['lr', 'lasso', 'ridge','en', 'lar', 'llar'])
  Model MAE MSE RMSE R2 RMSLE MAPE TT (Sec)
lasso Lasso Regression 29.4194 1446.4202 37.4531 0.4394 0.6341 0.7641 0.4380
ridge Ridge Regression 29.6700 1467.4624 37.7583 0.4320 0.6414 0.7578 0.0060
en Elastic Net 29.9627 1468.4204 37.7039 0.4301 0.6344 0.7736 0.0080
lr Linear Regression 29.8843 1479.4645 37.9202 0.4274 0.6583 0.7648 1.0860
llar Lasso Least Angle Regression 34.7031 1716.7393 41.1517 0.3576 0.7857 1.0526 0.0060
lar Least Angle Regression 214.7651 133898.8908 262.3568 -41.2500 1.3428 7.2594 0.0080
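`compare_models` ranks each candidate by its 5-fold cross-validated error. The same scoring loop can be sketched directly with scikit-learn; the data below is synthetic and the metric values are illustrative only:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(123)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=100)

# 5-fold CV mean absolute error, the primary ranking metric above.
results = {}
for name, est in [("lasso", Lasso(alpha=0.1)), ("ridge", Ridge(alpha=1.0))]:
    mae = -cross_val_score(est, X, y, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    results[name] = mae
    print(f"{name}: MAE={mae:.3f}")
```

PyCaret reports MSE, RMSE, R2, RMSLE, and MAPE the same way, each averaged over the five folds.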
model = create_model('lasso')
  MAE MSE RMSE R2 RMSLE MAPE
Fold            
0 30.5005 1306.2978 36.1427 0.6634 0.6453 0.8018
1 26.7571 1226.8475 35.0264 0.5233 0.6532 0.8463
2 28.2635 1168.3029 34.1804 0.4297 0.4068 0.3275
3 23.6148 997.8572 31.5889 0.6096 0.6542 0.8065
4 37.9614 2532.7959 50.3269 -0.0291 0.8111 1.0384
Mean 29.4194 1446.4202 37.4531 0.4394 0.6341 0.7641
Std 4.8219 552.5607 6.6097 0.2473 0.1295 0.2349
model = tune_model(model)  ## note: tuning slightly worsens the 5-fold CV MAE here (30.55 vs. 29.42)
  MAE MSE RMSE R2 RMSLE MAPE
Fold            
0 32.7466 1366.0072 36.9595 0.6480 0.6996 0.9206
1 29.0010 1413.4174 37.5954 0.4509 0.6608 0.8726
2 28.5340 1232.8230 35.1116 0.3982 0.4609 0.3293
3 22.5737 999.2052 31.6102 0.6091 0.6502 0.8043
4 39.8887 2599.1195 50.9816 -0.0560 0.7829 1.0479
Mean 30.5488 1522.1145 38.4517 0.4100 0.6509 0.7949
Std 5.6942 557.3595 6.6018 0.2511 0.1058 0.2461
evaluate_model(model)
predict_model(model) ## predict on the holdout set
  Model MAE MSE RMSE R2 RMSLE MAPE
0 Lasso Regression 18.3184 451.2826 21.2434 0.7812 0.4577 0.4554
last_flight_cycle mean_nc mean_htbleed mean_X44321P02_op016 mean_X44321P02_op420 mean_X54321P01_op116 mean_X54321P01_op220 mean_X65421P11_op232 mean_X65421P11_op630 operator_AIC operator_AXM operator_FRON operator_PGT mean_X65421P11_op630_multiply_mean_X44321P02_op420 mean_X54321P01_op116_multiply_last_flight_cycle mean_htbleed_multiply_mean_nc mean_nc_multiply_last_flight_cycle rul Label
0 55.0 9050.980469 393.054535 24.049202 14.134301 30.032610 22.774393 239.792557 287.508270 0.0 0.0 1.0 0.0 4063.728516 1651.793579 3557529.00 4.978039e+05 123 107.292969
1 68.0 9066.904297 392.000000 17.068485 12.682482 28.912149 25.639204 186.075317 183.187302 0.0 0.0 1.0 0.0 2323.269531 1966.026123 3554226.50 6.165495e+05 130 118.529297
2 73.0 9057.731445 392.287659 8.047889 10.678687 33.833588 25.187513 240.154816 153.209244 0.0 0.0 0.0 1.0 1636.073608 2469.851807 3553236.25 6.612144e+05 122 120.369141
3 171.0 9059.825195 392.292389 21.018927 14.269072 22.455219 27.310680 188.682510 226.495361 0.0 0.0 1.0 0.0 3231.878418 3839.842529 3554100.50 1.549230e+06 127 87.818359
4 168.0 9057.583008 392.773804 17.294361 12.646115 33.331760 27.172123 229.029007 142.352966 0.0 1.0 0.0 0.0 1800.212036 5599.735840 3557581.25 1.521674e+06 28 38.929688
5 31.0 9049.054688 391.741943 16.257576 13.692347 23.372644 19.509785 220.605606 172.852112 0.0 0.0 0.0 1.0 2366.750977 724.552002 3544894.25 2.805207e+05 149 172.435547
6 105.0 9052.970703 392.838104 24.065445 9.905726 33.374115 17.178091 237.405304 128.142868 0.0 1.0 0.0 0.0 1269.348145 3504.281982 3556351.75 9.505619e+05 78 64.326172
7 144.0 9049.362305 392.375000 19.759495 12.905885 27.264637 22.027479 183.662613 214.633728 0.0 0.0 0.0 1.0 2770.038086 3926.107666 3550743.50 1.303108e+06 134 103.671875
8 162.0 9061.944336 392.864197 24.589872 14.170225 33.510212 19.755394 160.909058 223.617538 0.0 0.0 0.0 1.0 3168.710938 5428.654297 3560113.50 1.468035e+06 9 35.873047
9 98.0 9052.542969 392.938782 24.278660 12.311980 21.847729 26.233198 125.907394 178.445847 0.0 0.0 1.0 0.0 2197.021729 2141.077393 3557095.25 8.871492e+05 123 113.046875
final_model = finalize_model(model)
final_model
Lasso(alpha=7.73, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=123,
      selection='cyclic', tol=0.0001, warm_start=False)
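`finalize_model` refits the pipeline on the full dataset with the tuned `alpha=7.73`. As a hedged aside on what that hyperparameter does: Lasso's alpha controls how aggressively coefficients are driven exactly to zero, which is why it performs implicit feature selection. A small synthetic demonstration:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = X[:, 0] * 5.0 + rng.normal(scale=0.1, size=100)  # only feature 0 matters

# A larger alpha drives more coefficients exactly to zero.
zero_counts = {}
for alpha in (0.01, 1.0):
    n_zero = int((Lasso(alpha=alpha).fit(X, y).coef_ == 0).sum())
    zero_counts[alpha] = n_zero
    print(f"alpha={alpha}: {n_zero} zero coefficients")
```

At the larger alpha the five noise features are zeroed while the truly informative one survives (shrunken), which is the behavior the tuned model relies on with 17 transformed features and only 90 training rows.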

Save the Model

save_model(final_model, 'model')
Transformation Pipeline and Model Successfully Saved
(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True, features_todrop=[],
                                       id_columns=[], ml_usecase='regression',
                                       numerical_features=[], target='rul',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='not_available',
                                 fill_value_categorical=None,
                                 fill_value_numerical=None,
                                 numeric_strategy='me...
                  DFS_Classic(interactions=['multiply'], ml_usecase='regression',
                              n_jobs=-1, random_state=123, subclass='binary',
                              target='rul',
                              top_features_to_pick_percentage=None)),
                 ('pca', 'passthrough'),
                 ['trained_model',
                  Lasso(alpha=7.73, copy_X=True, fit_intercept=True,
                        max_iter=1000, normalize=False, positive=False,
                        precompute=False, random_state=123, selection='cyclic',
                        tol=0.0001, warm_start=False)]],
          verbose=False),
 'model.pkl')
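`save_model` pickles the entire transformation pipeline plus the trained Lasso to `model.pkl`, and the counterpart `load_model('model')` from `pycaret.regression` restores it for scoring new engines. A minimal sketch of the same save-and-reload round trip, shown here with joblib and a plain scikit-learn Lasso on synthetic data (the file name is illustrative):

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import Lasso

rng = np.random.default_rng(123)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

model = Lasso(alpha=0.01).fit(X, y)

# Persist and reload, as save_model/load_model do for the full pipeline.
path = os.path.join(tempfile.gettempdir(), "demo_model.pkl")
dump(model, path)
reloaded = load(path)

# Predictions from the reloaded model match the original exactly.
assert np.allclose(model.predict(X), reloaded.predict(X))
```

Because PyCaret saves the preprocessing steps alongside the estimator, the reloaded pipeline can accept raw columns in the original schema rather than pre-transformed features.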